Zhongxiang Dai

CASTLE: A Comprehensive Benchmark for Evaluating Student-Tailored Personalized Safety in Large Language Models

Feb 05, 2026
Via arXiv

Workflow-R1: Group Sub-sequence Policy Optimization for Multi-turn Workflow Construction

Feb 01, 2026
Via arXiv

Real-Time Aligned Reward Model beyond Semantics

Jan 30, 2026
Via arXiv

UCO: A Multi-Turn Interactive Reinforcement Learning Method for Adaptive Teaching with Large Language Models

Nov 12, 2025
Via arXiv

EduAgentQG: A Multi-Agent Workflow Framework for Personalized Question Generation

Nov 08, 2025
Via arXiv

ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment

May 25, 2025
Via arXiv

Convergence Rates of Constrained Expected Improvement

May 16, 2025
Via arXiv

Active Human Feedback Collection via Neural Contextual Dueling Bandits

Apr 16, 2025
Via arXiv

Online Clustering of Dueling Bandits

Feb 04, 2025
Via arXiv

Refining Adaptive Zeroth-Order Optimization at Ease

Feb 03, 2025
Via arXiv